Poster: marcus lucero Date: Oct 12, 2007 5:11pm
Forum: texts Subject: Re: API?

As Collins said above, each item in the texts archive does in fact receive a unique identifier, which in turn creates a persistent URL. What you can do, and I am not an engineer, is point to that unique URL.

(e.g. which will always stay the same and never be replaced by other files)

Others have "scraped" our database from outside but have never really shared their techniques.



Poster: EmilPer Date: Oct 20, 2007 12:51am
Forum: texts Subject: Re: API?

"Others ... have never really shared their techniques."

This could be because it's not that difficult to scrape the public domain books archive, and consequently there is not much code or technique to share.

It could also be because the Archive does not say clearly what it allows and what it does not. For example, Project Gutenberg says clearly what can be done with their content, so there are many PG readers out there that read their text databases, process the book text, split it into pages, reformat it, etc. The Archive does not state, or not in a place that's easy to find, what can and cannot be done with the texts it hosts.

"Access to the Archive’s Collections is provided at no cost to you and is granted for scholarship and research purposes only." is very ambiguous. Would anyone spend a few hundred man-hours writing code to search, download MARC/DC/meta files, get the full-text files, index the text, cross-reference it, identify correlations, generate new searches, etc., and then share the code, only to find out that "no, that's not allowed"? Most likely s/he would leech as much as possible, share only the code that does the leeching, and then claim s/he is writing a better spelling checker and needs raw data.

Ambiguous "Terms of Use" and "questions or comments regarding these terms ... at" means "don't bother unless you can afford to pay a lawyer full time".